NSF PAR Search | NSF Public Access Repository

Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs

Liu, Xiaozhen; Huang, Yicong; Lin, Xinyuan; Kumar, Avinash; Alsudais, Sadeem; Li, Chen (June 2025, ACM SIGMOD)

Free, publicly-accessible full text available June 22, 2026
Dynamic-State-Estimation-Based Cyber Attack Detection for Inverter-Based Resources

https://doi.org/10.1109/PESGM52003.2023.10252357

Kumar, Avinash; Lin, Yuzhang; Huang, Heqing; Lu, Xiaonan; Zhao, Yue (July 2023, IEEE PES General Meeting)
Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs

https://doi.org/10.1145/3698832

Liu, Xiaozhen; Huang, Yicong; Lin, Xinyuan; Kumar, Avinash; Alsudais, Sadeem; Li, Chen (December 2024, Proceedings of the ACM on Management of Data)

Data analytics tasks are often formulated as data workflows represented as directed acyclic graphs (DAGs) of operators. The recent trend of adopting machine learning (ML) techniques in workflows results in increasingly complicated DAGs with many operators and edges. Compared to the operator-at-a-time execution paradigm, pipelined execution has benefits of reducing the materialization cost of intermediate results and allowing operators to produce results early, which are critical in iterative analysis on large data volumes. Correctly scheduling a workflow DAG for pipelined execution is non-trivial due to the richer semantics of operators and the increasing complexity of DAGs. Several existing data systems adopt simple heuristics to solve the problem without considering costs such as materialization sizes. In this paper, we systematically study the problem of scheduling a workflow DAG for pipelined execution, and develop a novel cost-based optimizer called Pasta for generating a high-quality schedule. The Pasta optimizer is not only general and applicable to a wide variety of cost functions, but also capable of utilizing properties inherent in a broad class of cost functions to improve its performance significantly. We conducted a thorough evaluation of developed techniques on real-world workflows and show the efficiency and efficacy of these solutions.
Full Text Available
The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter

https://doi.org/10.1080/10826084.2023.2280572

Rhee, Joshua U; Huang, Yicong; Soroosh, Aurash J; Alsudais, Sadeem; Ni, Shengquan; Kumar, Avinash; Paredes, Jacob; Li, Chen; Timberlake, David S (November 2023, Substance Use & Misuse)

Full Text Available
How the experience of California wildfires shape Twitter climate change framings

https://doi.org/10.1007/s10584-023-03668-0

Ko, Jessie W Y; Ni, Shengquan; Taylor, Alexander; Chen, Xiusi; Huang, Yicong; Kumar, Avinash; Alsudais, Sadeem; Wang, Zuozhi; Liu, Xiaozhen; Wang, Wei; et al (January 2024, Climatic Change)

Climate communication scientists search for effective message strategies to engage the ambivalent public in support of climate advocacy. The personal experience of wildfire is expected to render climate change impacts more concretely, pointing to a potential message strategy to engage the public. This study examined Twitter discourse related to climate change during the onset of 20 wildfires in California between the years 2017 and 2021. In this mixed method study, we analyzed tweets geographically and temporally proximal to the occurrence of wildfires to discover framings and examined how frequencies in climate framings changed before and after fires. Results identified three predominant climate framings: linking wildfire to climate change, suggesting climate actions, and attributing climate change to adversities besides wildfires. Mean tweet frequencies linking wildfire to climate change and attributing adversities increased significantly after the onset of fire. While suggesting climate action tweets also increased, the increase was not statistically significant. Temporal analysis of tweet frequencies for the three themes of tweets showed that discussion increased after the onset of a fire but persisted typically no more than 2 weeks. For fires that burned for longer periods of more than a month, external events triggered climate discussions. Our findings contribute to identifying how the personal experience of wildfire shapes Twitter discussion related to climate change, and how these framings change over time during wildfire events, leading to insights into critical time points after wildfire for implementing message strategies to increase public engagement on climate change impacts and policy.
Full Text Available
Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions

https://doi.org/10.1145/3597465.3605219

Alsudais, Sadeem; Kumar, Avinash; Li, Chen (January 2023, HILDA Workshop at SIGMOD 2023)

Using GUI-based workflows for data analysis is an iterative process. During each iteration, an analyst makes changes to the workflow to improve it, generating a new version each time. The results produced by executing these versions are materialized to help users refer to them in the future. In many cases, a new version of the workflow, when submitted for execution, produces a result equivalent to that of a previous one. Identifying such equivalence can save computational resources and time by reusing the materialized result. One way to optimize the performance of executing a new version is to compare the current version with a previous one and test if they produce the same results using a workflow version equivalence verifier. As the number of versions grows, this testing can become a computational bottleneck. In this paper, we present Raven, an optimization framework to accelerate the execution of a new version request by detecting and reusing the results of previous equivalent versions with the help of a version equivalence verifier. Raven ranks and prunes the set of prior versions to quickly identify those that may produce an equivalent result to the version execution request. Additionally, when the verifier performs computation to verify the equivalence of a version pair, there may be a significant overlap with previously tested version pairs. Raven identifies and avoids such repeated computations by extending the verifier to reuse previous knowledge of equivalence tests. We evaluated the effectiveness of Raven compared to baselines on real workflows and datasets.
Full Text Available
Full-band Monte Carlo simulation of two-dimensional electron gas in (AlₓGa₁₋ₓ)₂O₃/Ga₂O₃ heterostructures

https://doi.org/10.1063/5.0109577

Kumar, Avinash; Singisetti, Uttam (November 2022, Journal of Applied Physics)

β-Gallium oxide (Ga₂O₃) is an extensively investigated ultrawide-bandgap semiconductor for potential applications in power electronics and radio frequency switching. The room temperature bulk electron mobility (∼200 cm² V⁻¹ s⁻¹) is comparatively low and is limited by the 30 phonon modes originating from its 10-atom primitive cell. The theoretically calculated saturation velocity in bulk is 1–2×10⁷ cm s⁻¹ (comparable to GaN) and is limited by the low field mobility. This work explores the high field electron transport (and hence the velocity saturation) in the 2DEG based on the first principles calculated parameters. A self-consistent calculation on a given heterostructure design gives the confined eigenfunctions and eigenenergies. The intrasubband and the intersubband scattering rates are calculated based on Fermi’s golden rule considering longitudinal optical (LO) phonon–plasmon screening. The high field characteristics are extracted from the full-band Monte Carlo simulation of heterostructures at 300 K. The overall system is divided into a 2D and a 3D region mimicking the electrons in the 2DEG and the bulk, respectively. The electron transport is treated through an integrated Monte Carlo program which outputs the steady state zone population, transient dynamics, and the velocity–field curves for a few heterostructure designs. The critical field for saturation does not change significantly from bulk values; however, an improved peak velocity is calculated at a higher 2DEG density. The velocity at low 2DEG densities is impacted by the antiscreening of LO phonons, which plays an important role in shaping the zone population. A comparison with the experimental measurements is also carried out and possible origins of the discrepancies with experiments are discussed.
Dynamic State Estimation for Inverter-Based Resources: A Control-Physics Dual Estimation Framework

https://doi.org/10.1109/TPWRS.2024.3362701

Huang, Heqing; Lin, Yuzhang; Lu, Xiaonan; Zhao, Yue; Kumar, Avinash (September 2024, IEEE Transactions on Power Systems)
Demonstration of collaborative and interactive workflow-based data analytics in Texera

https://doi.org/10.14778/3554821.3554888

Liu, Xiaozhen; Wang, Zuozhi; Ni, Shengquan; Alsudais, Sadeem; Huang, Yicong; Kumar, Avinash; Li, Chen (August 2022, Proceedings of the VLDB Endowment)

Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators with various backgrounds. Based on our active joint projects on the system, we use a scenario of social media analysis to show how a data science task can be conducted on a user-friendly yet powerful platform by a multi-disciplinary team including domain scientists with limited coding skills and experienced machine learning experts. We will present how to do collaborative editing of a workflow and collaborative execution of the workflow in Texera. We will focus on data-centric features such as synchronization of operator schemas among the users during the construction phase, and monitoring and controlling the shared runtime during the execution phase.
Full Text Available
Plasmon-phonon coupling in electrostatically gated β-Ga₂O₃ films with mobility exceeding 200 cm² V⁻¹ s⁻¹

https://doi.org/10.1021/acsnano.1c09535

Rajapitamahuni, Anil Kumar; Manjeshwar, Anusha Kamath; Kumar, Avinash; Datta, Animesh; Ranga, Praneeth; Thoutam, Laxman Raju; Krishnamoorthy, Sriram; Singisetti, Uttam; Jalan, Bharat (April 2022, ACS nano)

Full Text Available
